ABSTRACT
[...] community to higher order thinking skills are examined. Through a
detailed description, a model assessment development process is
grounded in concerns for subject matter validity, feasibility, and
credibility. An illustrative example of the development of
sophisticated scoring approaches to be used to assess the quality of
content in a student's writing leads to a discussion of educational
indicators as a way of providing a context for results of new
measures. A content scoring rubric is described, which incorporates
use of prior knowledge, principles, facts and events,
problems/premises, text information, interrelationships, and
misconceptions. An example of the development of new curriculum
indicators is included to demonstrate the complexity and utility of
such efforts. Educational indicators, characteristics of indicator
systems, and indicators as the context for higher order thinking are
considered. An 88-item list of references is included. (SLD)
Introduction



It is now impossible to deny that testing and other approaches to the
assessment of achievement represent one of the most widespread and powerful
approaches to controlling the quality of schooling. It is similarly clear that
the impact of tests in the service of accountability is not an unalloyed good. Critics
contend that such tests may be shallow, that they may be corruptible, or that they may be
incorrect [...] (Sanders, 1989; Linn, 1989; Burstein, 1989; Baker, 1989;
[...], 1989). This chapter will attempt to place in context the renewed attention
to assessments that attempt to capture more complex aspects of students'
educational attainments.



How is Higher Order Thinking Conceived and Measured? 

Higher order thinking has been conceived both in terms of analyses of
intellectual processes and of task characteristics.

Intellectual Attributes of Higher Order Thinking 

In simplest terms, higher order thinking measures include all intellectual tasks
that call for more than information retrieval. Any transformation of information is,
by definition, "higher order" thinking. Early writers who took the approach of
detailing intellectual processes and illustrative tasks included Bloom (1956)
and Gagne (1985). Operating from an assessment perspective, Bloom and his
colleagues formulated an analysis that popularized the term "higher order," since the
higher levels of the Taxonomy of Educational Objectives (comprehension,
application, analysis, synthesis, and evaluation) provided an operational definition
that influenced test developers and curriculum designers for years. Gagne's analysis
of the intellectual processes required by tasks, developed from a learning and training
perspective, was of similar influence and was also frequently construed to have a
hierarchical character. Both analyses rest upon the inference of transformation and
construction processes from tasks, and both served as important precursors to many
current cognitive analyses.

Other formulations of higher order thinking derive from the general problem-
solving literature, and emphasize such task components as problem identification
and solution testing. These may treat problem solving either as a subject-matter-
independent task, akin to general critical thinking (Ennis, 1987), or as
dependent upon specific content domains. Higher order thinking can also take
the form of metacognitive skills, such as planning and self-checking. These skills
may be either independent of or embedded in the subject matter task to which
they are applied. Still other formulations focus on intellectual processes as they are
unique to particular subject matter domains (e.g., the use of appropriate rhetorical
structure in written composition).

Higher Order Assessment Tasks 

On the assessment plane, it is common to ascribe higher order thinking to
relatively global and surface features of the tasks themselves. Certain task attributes
are thought to require higher order processing. For example, "open-ended"
questions describe tasks for which many answers are possible and for which
higher order processes are presumed necessary to respond (California State Department of
Education, 1989). Obviously, open-ended questions can also solicit a range of
lower order responses (e.g., "Tell me all the facts you know about dinosaurs"). In the same way,
almost all student constructions, such as those included in a writing or other
portfolio, are assumed to be examples of higher order processes.
The concept of performance assessment is also partially connected to the
measurement of higher order thinking (Baker, O'Neil, & Linn, in press). Present
formulations of performance assessment involve two major dimensions: the
recording of ephemeral behavior, such as electronic troubleshooting strategy or the
conduct of a biology experiment, and the rating of resulting products or
solutions. It is debatable whether records of performance necessarily require higher order
thinking. Consider in particular the large proportion of performance assessments in
industrial or military contexts that require the respondent to use specifically practiced
operations and procedures in tasks of very narrow boundaries. One variation on
performance assessment that may have a higher order component has renewed
currency in public education. The approach, called "seamless" measurement,
authentic assessment (Burstall, 1989), or blended assessment (Carlson, 1989), is based
on the need for stronger validity in terms of both learning and instructional
analysis. This approach uses complex activities, such as
experiments on the absorbency of paper towels (Shavelson, 1989) or the
preparation of analytical papers in history (Baker & Clayton, 1989), to make
judgments about achievement. Such tasks may take longer than the usual single class
period to accomplish, and may be performed individually, cooperatively, or in a
team environment. These activities, even when used for assessment, share many
properties with good instructional lessons, including the arousal of curiosity in
students and the fact that teachers themselves may be rating the quality of students'
performance. In many of these examples, both student processes and products may
be judged. Nonetheless, measures of higher order thinking have significant costs.
Among them are the restricted number of tasks that may be sampled in a fixed
period of time, the administrative and practical intricacies of reliable and accurate
scoring, and the credibility of results for a society used to numbers, stanines, and
norms. On the other hand, their benefits include increased validity and potential for
positive curricular and instructional impact. Another important benefit may be a
reduction in the salience of ceremonial testing, and its attendant costs of targeted
test preparation.
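The cost of restricted task sampling noted above can be made concrete with the Spearman-Brown formula, a standard psychometric result that the original text does not itself invoke. The sketch below assumes parallel tasks and a hypothetical single-task reliability of 0.40; it projects how the reliability of an averaged score would grow as more tasks are sampled.

```python
def spearman_brown(single_task_reliability: float, n_tasks: float) -> float:
    """Projected reliability of a score averaged over n parallel tasks.

    Assumes the tasks are parallel (equal reliability, interchangeable),
    which extended performance tasks rarely are in practice.
    """
    r = single_task_reliability
    if not 0.0 <= r <= 1.0 or n_tasks <= 0:
        raise ValueError("reliability must be in [0, 1] and n_tasks positive")
    return n_tasks * r / (1 + (n_tasks - 1) * r)


if __name__ == "__main__":
    r1 = 0.40  # hypothetical reliability of a single performance task
    for n in (1, 2, 4, 8):
        print(f"{n} task(s): projected reliability {spearman_brown(r1, n):.2f}")
```

Under these assumptions, moving from one task to four raises projected reliability from 0.40 to about 0.73, which illustrates the trade-off a fixed testing period forces on extended tasks.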

Again, the avoidance of a multiple-choice format or the presence of a label such as
"open-ended" or "authentic" is not sufficient to assure that these measures are
actually assessing higher order processes. Details of students' actual instructional
histories and the role of test administrators can turn measures that appear to tap
the highest reaches of human thought into rote tasks. This reality underscores a
major point of this paper: that any measurement process, and the assessment of higher order
thinking in particular, must be understood both in the light of other available information and
of its intended uses.

In the next section, the multiple sources for the higher order thinking
movement will be discussed. Subsequent sections will consider the policy context
for measurement, including a description of the multiple indicator approach. They
will also provide detailed examples of both a higher order thinking measure and an
example which might inform its understanding in a policy context. Overall, this
chapter will consider the multitude of issues relevant to understanding the next
wave of achievement measurement.



Impetus for the Measurement of Higher Order Thinking 

The potential uses of tests condition their development. In the area of
testing in general, and higher order testing in particular, there are at least three
major sources for new measures: research, policy, and practice. All three of these
have converged on the need to measure higher order processes. Let us consider
each in turn.






Scientific Research 



The research context affects measure development in two related ways.
First, theoretically driven targets of inquiry (in other words, new constructs) help
us reconceptualize our thinking about common processes. Examples of such
constructs are mental model (Norman, 1983; Collins & Gentner, 1984), advance
organizer (Ausubel, 1960), "bug" (Brown & Burton, 1978), and metacognition
(Meichenbaum & Asarnow, 1978). These constructs are based largely on cognitive
science perspectives and provide frameworks that influence the design of human
performance measures. Our community has become interested in
measuring problem-solving processes, in assessing group performance to capture the
social meaning of certain tasks, and in measuring alternative representation modes
for student knowledge, all based on scientific research. Still unresolved are the
relative roles of process and product in the assessment of higher order thinking, the
place of assessment outside or within subject matter domains, and the importance of
new conceptions of transfer to assessment.

Clearly the topics of research are influenced not only by what is
theoretically interesting, but by what is socially important as well. Measurement
foci are similarly influenced by social goals. One way to keep score on social
importance is to track the availability of research funding. For example, the 1988
Vincennes incident in the Persian Gulf, in which an Iranian airliner was mistakenly
shot down, led to predictions of renewed research attention to measuring the task performance of
teams under high-stress conditions, and such predictions have been borne out (see, for
example, U.S. Department of Defense, 1989). Any choice in R&D goals can only be
judged in the light of the trade-offs in support for other goals. A case in point is the
deemphasis on equity issues in the political arena during the last administration.
This choice had clear consequences for acceptable assessment foci, consequences
which no doubt benefited more elitist targets, such as content-focused studies of
higher cognitive performance.

A second, related influence from the domain of research derives from the
actual methods used in research itself. When research approaches emphasized
behavioral constructs, and when quantitative methods with summary estimates for large
groups held methodological currency, psychometric research and test development
marched in support. Thus, there are clear implications for testing inherent in the
shift during the last fifteen years to cognitive psychology. With its relegitimating of
inferences from small samples (Shulman, 1986), self-report data (Ericsson & Simon,
1984), and other practices drawn from the ethnomethodology side of research
(Levine, 1988), we should expect characteristics of tests to develop accordingly (e.g.,
limited task sampling). Collateral psychometric developments (Bock, 1987), for
example, permit this transition by generating credible quantitative estimates for
measures based on restrictive task sampling.

Educational Policy 

The major impetus for identifying higher order thinking as an accountability
target grows from policy, not science, however. Recent surveys of educational
reform (Pipho, 1988) confirm that tests have been used as policy instruments with
increasing frequency since the publication of A Nation at Risk (National Commission
on Excellence in Education, 1983). Tests are seen, correctly, as at least one
operational way to communicate standards, and their uses have proliferated,
extending beyond requirements for the award of high school diplomas to exit tests
from kindergarten, grade-to-grade promotion, merit diplomas, and teacher
recertification. If tests are so important, attending to their focus becomes more
critical. Reasons for emphasizing higher order thinking in school programs and
assessment were identified in reports by prestigious and powerful groups. Three
ideas recur often in these reports.
The erosion of economic competitiveness. The international trade
imbalance, the editorials about the second-tier status of America, and the visible
incursion of foreign wealth continue to raise sharp anxieties. These problems have
been directly attributed to the failure of schools to prepare an adequately educated
workforce in reports by powerful and prestigious groups such as the National
Academy of Sciences, National Academy of Engineering, Institute of Medicine
(1984). The comparison group of interest is obviously the Japanese, and the subject
matters of concern are mathematics and science. The IEA report on US students'
performance in mathematics (McKnight et al., 1987), coupled with the emphasis on
formal testing in Japanese schools, strengthens for many the argument that testing
will provide a way to improve student performance (National Governors'
Association, 1986). One might note, with some irony, that the tests used in
Japanese schools consist principally of rote items. Thus, it is probably not the test
content itself that provides Japan with its educational competitive edge, but rather
the relatively permanent consequences of test failure.

The failure of schools to educate disadvantaged students to a level that
permits their full and productive participation in our society. Renewed
awareness of the problems of disadvantaged, principally black and Hispanic, youth is
fueled by media reports of increased violence, gang participation, an alarming rate
of teenage pregnancies, and rampant drug traffic and use. National attention is also
focused on adult literacy (see, for example, House & Madula, 1987; Kirsch &
Jungeblut, 1986; Bain & Herman, 1988; Sticht, 1987; MDC, Inc., 1985).

The inability of the educational system to prepare students for change.
Here, the grab bag of societal ills is attributed to failure of individuals and
organizations to adapt to change. Issues such as industrial redevelopment,
unsuccessful job displacement and retraining, increasing reliance on technology, and
illegal immigration are lumped together (see Cohen, 1987; Carnegie Forum on
Education and the Economy, Task Force on Teaching as a Profession, 1986; Goodlad,
1984; Sizer, 1984).

The indisputable fact is that these concerns stimulated a rash of specific state
and local reform efforts (Bennett, 1988), and that the focus of these reforms
has been substantially on standards and accountability. How has educational practice
reacted?

Educational Practice 

Let's call educational practice the composite of what local administrators and
teachers do to implement policy, to get curriculum enacted, and to teach
appropriately in classrooms. Systematic educational models link practice and
requirements. These models emphasize the centrality of formulating goal
statements, the consequent design and implementation of curricula and
instructional practices, and the subsequent creation, administration, and
interpretation of measures to assess the attainment of such goals. In real life, these
relationships are less rational and orderly. From a period of time when text
materials were the dominant fact of life, we have moved to a period when what is
tested is of equal or greater importance. At this juncture, practitioner views are
predictably divided. Many allies see testing as a preemptive, coercive, implicit goal-
setting device, whereas other assessment proponents see only good management.
The problem used to be to understand whether the major impact of formal testing
programs was to measure the impact of innovation or to reform educational programs
themselves (Baker, 1989). Most policy analysts would probably now cleave to the
reform function of tests.



If we believe that assessment now sets goals, we need to examine how
assessments and resources are structured to allow teachers to improve their teaching.



Tests should be reported as diagnostically connected to curricula, so that
teachers can act upon them instructionally. Knowing that a class falls into the lowest
quartile is of much less use to the teacher than knowing what domains of content and skills
need improvement for given children. A related aspect of the formative evaluation
use of tests (Baker, 1974) is the interplay between the availability of texts and
other curriculum materials and their relationship to both goals and measured
outcomes. This interplay has become a matter of major focus in some states and
districts, sometimes under the label of "curriculum alignment." To align
means that attempts are made to match goals, classroom instructional resources, and
tests. Because the test is the only element in the alignment set with clear sanctions
associated with it, the end result of alignment processes amplifies even further the
curricular influence of the test. Depending upon where you sit, ideologically and
organizationally, this kind of efficacy may be good or evil. A curriculum gets
narrowed or focused; take your pick.

For tests to affect instruction, teachers need to be able to teach higher
order skills [...] as well as to understand test results to inform
instructional strategies. Most studies of classroom practice suggest that teachers do
not use many such practices, although there is evidence that they can be trained
to do so [...]. Unfortunately, the knowledge base upon which teacher training
has drawn is not rich in providing clear strategies and techniques for [...].
Furthermore, teachers have limited management options for dealing with
increasingly individualized learning [...], again constrained by habit and standard
school organization. A related issue is teachers' ability and interest in using test data
to revise instruction. Studies of teachers' use of test information show that neither
curricula nor teachers [...] could cope with the level of accuracy, detail, and [...] of
information tests could provide (Dorr-Bremme & Herman, 1986). Furthermore,
teachers are so used to test information that is irrelevant to the way they perceive
teaching that such information is ignored without the pressure of accountability
sanctions. Attempts to make higher order thinking assessments appear to be more
like classroom activities and less foreign to the teaching environment should help.
[...] down the road, the utility of [...] (1988) and enriched and well-supported
computer interventions (Baker, Herman, & Gearhart, [...]) may provide needed
support. But computer magic is not yet [...].

[...] may be the opportunity provided by the teacher reprofessionalization and
restructuring movements. These [...] to teachers more than touch on the topic of
higher order thinking. Teacher organizations may now negotiate accountability
requirements in the service of focusing their goals for children and [...]
themselves. The recent activities of the National Board for Professional Teaching
Standards (Carnegie Corporation of New York, 1989) suggest that recertification
assessments will move strongly in this direction. Thus, there is some small chance
that the convergence of science, policy, and practice may actually result in
serious reform of schooling. If other strategies are selected, teachers will have
the problem of delivering on policy expectations without changes in their
preparation, resources, or commitment.

To sum up the points of this section, major forces, in the research and policy
communities in particular, have converged in promoting higher order thinking
assessment, and have raised the stakes for school accountability one
more round.

Prospects for Success for Policy-Driven Higher Order Thinking Assessment

What has been recent testing experience? To what extent have prior
testing reforms been successful? Answers to these questions may help us to predict
whether tests of higher order thinking will achieve their intended policy goals. Opinion is
mixed. Case studies of testing reforms conducted in five different localities were
reported by Ellwein and Glass (1987). It is their contention that testing programs
possess largely symbolic value, because the educational system finds a way around
standardized testing requirements. Cut scores get lowered, and other safety nets
are strung up to protect individuals who don't succeed. Shepard, Kreitzer, and Graue
(1987), in an analysis of the Texas teacher recertification test, reached similar
conclusions, as did Rudner and Baker (in press) when they reviewed statewide
teacher testing programs. Others are studying the concomitants of more stringent
standards and testing programs. Of interest are the differential effects on minority
students, effects which may increase their dropout rates (Catterall, 1987).

How much testing of higher order thinking is going on? This question is not
easily answered because of definitional problems and the aforementioned potential
transformation, by instruction, of higher order skills into memorized procedures. In
order to determine the nature and distribution of higher order thinking items in
mandated state programs, a study was conducted by Burstein and others (Baker,
Burstein, Aschbacher, & Keesling, 1985) to document the targets of assessment. Of
the states that had contracted specially for state testing measures at
that time, very few were found to use tests that pushed far beyond tasks of
information retrieval or relabeling. Similarly, very few tests focused beyond the
basic skills and assessed knowledge and skills in subject matter areas such as social
studies or science. Mathematics assessment dealt largely with arithmetic. Because we know
of specific subsequent changes in testing programs in Texas, [...], and New York
that are designed to address higher order thinking concerns, and
because these states often provide leadership, we expect to see more tests
labeled "higher order" in the near future.

So higher order testing is on the way. What should we anticipate from
investment in the measurement of higher order thinking skills? If one believes
Ellwein and Glass (1987), higher order thinking may become little more than a
symbolic flag around which to rally. One reason used to justify the development of
such measures is that in time they will be able to detect the effects of reforms such
as tougher curriculum standards (e.g., more time spent in class and more courses in
particular subject matters). If testing results are unacceptable (i.e., not enough
higher level performance is demonstrated) and political stakes are high enough,
Ellwein and Glass and others believe that the "system" will find a way to blur
outcomes to make them palatable to an expectant public. Such ways include
changing pass scores, using less difficult items nominally to measure a higher order
objective, or countenancing forms of explicit practice in instruction so as to change
higher order skills functionally into retrieval behaviors. Illicit practices, such as
falsifying results, also occur. At minimum, we might worry that the term "higher
order" will be misappropriated and used to describe less challenging skills.

Some relief may come from the focus on performance-based activity
structures as sources for a new regime of "standardized" tests. Allied with
opportunities provided by the restructuring movement, and with the leadership of
influential states, the future of testing may be brighter than its history.

This entire set of events may be further perturbed by our conditioning to
expect reports about achievement in the simplest terms. Fundamental
misunderstandings about test norms and about the validity of single score summaries
persist (Cannell, 1988a, 1988b). A mini-industry has developed in educational
agencies and private firms to make test scores understandable to parents and the
public, leading us to simplify, summarize, and perhaps work against the real goals we
have. In no other field with a closely coupled scientific base are major results
principally targeted to the least sophisticated consumer. However, the impetus of
[...] quality indicators efforts may have something to contribute to the [...].
[...] be weak, what is left to massage but the measure?

Assessing Higher Order Thinking: A Research-Based Example 
Moving away from general sources and predictions, the next section of this
chapter [...] to think of [...] as the assessment of deep understanding. We also
[...] other subject fields.
Task Area 

We selected the area of history and focused on the pre-Civil War era [...]
measures. In contrast to the [...], we chose to use primary source materials
[...] constraints (Baker, Freeman, & Clayton, 1989).

Thus we are attempting to explore the construct of "deep understanding";
our present definition is [...] and relates to the following components and
attendant theoretical bases:

1. Deep understanding requires the activation of thinking processes applied
[...] as active construction in the knowledge acquisition process,
[...] prior knowledge (see Brown & Campione, 1986, and the comprehensive
review by Segal, Chipman, & Glaser, 1985).

2. Deep understanding may involve qualitative differences between expert
and novice [...] (Chi & Glaser, 1980). Expert understanding of [...] in history
[...] premise-driven [...]; novice [...] componential (Baker,
Freeman, & Clayton, 1989).

1 The author wishes to thank her colleagues on the project, particularly Marie Freeman, Serena
[...], [...], David Niemi, Reggie [...], and Pam Aschbacher.
3. The expression of deep understanding depends upon a sophisticated
interplay among three types of knowledge: strategic, procedural, and
content (or declarative). Strategic knowledge represents top-level
knowledge of the major attributes and relationships of a discipline (i.e., the
extent to which interpretation of events in history is context driven and
the result of the interrelationships of complex factors such as politics,
geography, economics, etc.; see California State Board of Education, 1988). It
also involves the understanding of the role of the historian, argument
structure, and verification procedures, what is often called process. We use
procedural knowledge to describe routines the student uses to construct
answers to our particular format of measures (e.g., how to write an essay).
Content knowledge focuses on the elements inside the discipline, the
principles, concepts, [...] that provide the manipulable information
[...]. We wish further to distinguish among the contributions to student
performance of prior knowledge, instruction, and the text or other
stimuli provided in the measure.

A constructivist view of comprehension suggests that [...]
material is influenced explicitly by prior knowledge [...]
activation of broad schemata (Rumelhart, 1980; Wittrock, [...]) or to [...]
the [...] terms, premises, and viewpoints to provide context.

Describing patterns of prior knowledge, or mental models, and their effects on
[...] provides another [...] to
assess our progress (see Kieras, 1988; de Kleer & Brown, 1983; Brown & VanLehn,
1980; Carpenter, Moser, & Romberg, 1982).

Our study was initially designed to expand the content validity [...]
for essays in subject matter beyond those that were commonly used in written
composition (Baker, Freeman, & Clayton, 1989). Our efforts at the outset were to
attempt to identify the attributes against which [...]
might be judged. To this end, essays were collected after 11th-grade
students read either a Lincoln or a Douglas speech. These were scored by two groups
of experts. First, teachers trained to use an essay scoring rubric principally focused
on writing quality (i.e., organization, style) scored the essays. A second
group of history teachers was asked to score only the content knowledge
exhibited in the essays and then to isolate the attributes of the best and worst essays.
Our data showed a remarkable degree of agreement between the two groups of
raters, suggesting that matters of expression were swamping the detection of
content understanding. We also had collected think-aloud protocols from teachers,
historians, and students who were asked to read speeches and respond to the
essay question. Our analyses of these essays suggested that experts relied heavily on
prior knowledge, usually moved from a premise or specific organizing principle, and
used text information for illustration to construct their arguments.
Some teachers, on the other hand, attended much more specifically to the
presented text, coherent argument, or comprehensiveness of detail, and
preferred essays that were less premise driven. After a series of pilot studies, we
have developed a content scoring rubric which incorporates the following elements:

• use of prior knowledge
• principles
• facts and events
• problem/premise driven
• text information
• interrelationships
• misconceptions

In addition, an overall impression score for content quality and for essay
quality is solicited.
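The agreement between the two rater groups described above (teachers using a writing-quality rubric and history teachers scoring content only) could be summarized with a correlation. The source does not report which agreement statistic was used; the sketch below assumes a Pearson correlation, and the essay scores in it are invented for illustration, not data from the study.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two raters' scores for the same essays."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length score lists of length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a rater gave every essay the same score")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical 1-5 scores for the same six essays (not data from the study).
    writing_quality_raters = [2, 3, 4, 4, 5, 1]
    content_only_raters = [2, 3, 3, 4, 5, 1]
    r = pearson_r(writing_quality_raters, content_only_raters)
    print(f"Pearson r between rater groups: {r:.2f}")
```

A high correlation by itself cannot show that expression is swamping content; that interpretation also rests on the content analyses and think-aloud protocols described above.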
The evolution of our scoring scheme required some revisions in the prompt
[...] and [...] rubric. We are most interested to see the
[...] of the elements in our scoring rubric across other essay topics (i.e., the
[...]), [...] such as research papers, and in other subject areas
(e.g., reports of laboratory experiments in biology).

Our immediate research plans call for the validation of this measure and its
utility at different grade levels. We are also refining general specifications for
measures of prior knowledge. We also wish to test the generalizability of our scoring
dimensions across topics.

Our hope is to develop a scale that will assess multiple aspects of higher
order cognitive work and that can be adapted to particular contexts. For
instance, the scale might be specially weighted to emphasize the incorporation of
[...] if [...] acquisition or learning to learn were the
major focus. If the scale were used to assess general history
knowledge, more emphasis would be placed on organizing premises, prior
knowledge, and misconceptions. We are also in the process of exploring the use of
[...] as a way of directly assessing students' representation of content
knowledge in essay planning.
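The idea of reweighting the scale for different purposes can be sketched as a weighted mean over the seven content elements of the rubric. The numeric weights, the 1-to-5 rating scale, and the convention that a higher misconceptions score means fewer or less serious misconceptions are all assumptions made for illustration; the source states only which elements would receive more emphasis when the scale assesses general history knowledge.

```python
RUBRIC_ELEMENTS = (
    "prior_knowledge",
    "principles",
    "facts_and_events",
    "problem_premise",
    "text_information",
    "interrelationships",
    "misconceptions",  # assumed coding: higher score = fewer misconceptions
)

# Hypothetical weight profiles; elements not listed default to weight 1.0.
WEIGHT_PROFILES = {
    "equal": {},
    "general_history": {
        "problem_premise": 2.0,  # organizing premises
        "prior_knowledge": 2.0,
        "misconceptions": 2.0,
    },
}


def weighted_content_score(scores: dict, profile: str) -> float:
    """Weighted mean of element scores, so the result stays on the rating scale."""
    missing = set(RUBRIC_ELEMENTS) - set(scores)
    if missing:
        raise ValueError(f"missing element scores: {sorted(missing)}")
    weights = WEIGHT_PROFILES[profile]
    total = sum(weights.get(e, 1.0) * scores[e] for e in RUBRIC_ELEMENTS)
    weight_sum = sum(weights.get(e, 1.0) for e in RUBRIC_ELEMENTS)
    return total / weight_sum


if __name__ == "__main__":
    essay = {  # hypothetical 1-5 ratings for one essay
        "prior_knowledge": 4, "principles": 3, "facts_and_events": 5,
        "problem_premise": 4, "text_information": 2,
        "interrelationships": 3, "misconceptions": 5,
    }
    for profile in WEIGHT_PROFILES:
        print(profile, round(weighted_content_score(essay, profile), 2))
```

For this hypothetical essay the equal-weight score is about 3.71 and the general-history profile gives 3.9, because the essay is relatively strong on the three up-weighted elements. Using a weighted mean rather than a weighted sum keeps both profiles on the same scale, so scores remain comparable across purposes.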

Our approach to deep understanding was selected for practical as
well as theoretical reasons: (a) extended written responses provide opportunities to
observe more directly the products of complex thinking processes [...]
(Baker, Freeman, & Clayton, 1989). [...] focused on developing stimulus materials
that would (a) meet standards of credibility for historians and history teachers, and
(b) simulate instructional exposure (i.e., the long passages to be read).

Creating new measures such as our history task is a valuable undertaking,
particularly when the scoring scheme for the task results in economies and [...]

Understanding Outcomes: Educational Indicators 

The metaphor of indicator systems has been adapted from the field of
economics (Murnane & [...], 1987; Baker & Herman, 1985). Indicator systems
[...] data in relatively simplified models [...] improve understanding [...].
Although many [...] have attempted to describe indicator systems (see
Oakes, 1986, for a fundamental treatment of the topic in education), indicator
systems typically include measures of inputs, processes, and outputs. Inputs consist
of the characteristics of students, the socioeconomics of the neighborhood, and the
characteristics of [...]. Processes consist of curricula and instructional [...]
Characteristics of Indicator Systems

What are some attributes of indicator systems that we might attempt to apply to the problems in the measurement of higher order thinking?

An indicator may be a composite of multiple measures, especially where there are … models and a weak knowledge base.

… than … by ordinary achievement tests.

Indicators are reported in context. Appropriate reporting of any indicator … . Indicators would be similarly complex, … student mobility, allocations of staff, changes in …, per capita support, and class size … robust across site and program differences.

Indicators derive their meaning from … a longitudinal and general estimate of the health of … programs were implemented … the performances of the … changes over time (is it going up …?) … higher order … quality.

Indicators foster a more participatory educational environment. There should be reduced incentive to fool with any of the particular measures used to create the composite, since any one measure would be less likely to … value. Therefore, specifications and comparable forms for individual measures making up the indicator could be widely available. One feature of an ideal system would be the opportunity for local schools, or even classrooms of teachers and students, to record their particular instructional emphases, both to describe the experiences of students and to help interpret other data. The benefit of such information is …; the combined information would still be relatively easy for policy and public consumers to understand. Schools would be encouraged to use particular measures as professional aids to the improvement of instruction, which we hypothesize would support the intellectual involvement and interest of teachers.

Context for Higher Order Thinking: An Indicators Example 

Outcome indicators, such as measures of higher order thinking, are among the measures that clearly need explanatory contexts. For example, in the recent report of a pilot study of students' performance on open-ended questions in mathematics (California State Department of Education), fewer than half of the students achieved a satisfactory level, and a relatively small percentage produced complete answers. One explanation for this performance is that few such problem types are taught in schools. Similarly, poor student performance can be partially explained by curriculum indicators such as lack of coverage of topics, …, and low student enrollments in relevant courses.

A beginning in this area is to explore the development of curriculum indicators for use in state systems, and a team of UCLA/RAND researchers is engaged in developing such curriculum indicators in mathematics and in history in secondary schools. The purpose of this project is to develop interim policy indicators to assess the impact of reforms in the area of curriculum standards. Rather than waiting for outcome indicators to show improvement, curriculum indicators are designed to detect the extent to which reform intentions have found their way into enacted curricula. If successful, these indicators would provide powerful explanations for subsequent performance levels.

A second goal was to provide a more comprehensive picture of the content of class work and of course-taking patterns. At minimum, these measures should respond to policy changes in course requirements.

The complexity of this project involved issues of how to conceptualize the data, how to validate any new indicator we developed, and how to develop it in a form such that regular data could be acquired without undue burden on the respondents. Determining the effects of standards on course taking seems relatively straightforward. For instance, it is common to report the number of students enrolled in courses by title (e.g., Algebra I or American History). A less obvious problem is that the content of a course may be quite different depending on who teaches it. Even where attempts are made to standardize course content in a particular district, conventions for what content is included in a given course vary from site to site. Furthermore, some school districts offer courses with even less standardized content (e.g., business math). We hoped to develop an approach that could be used at many sites and that would provide a standard framework so that school, district, and state data on course content might be compared.

At the outset, we wished to determine the intent behind the … higher course work requirements, and this task was accomplished by a series of interviews (McDonnell, 1988) with governors' aides, policy makers, and … . In general, these interviews indicated that more stringent course requirements were instituted to improve the quality of student performance. McDonnell (1988) reported that policy makers described the purpose of course work reforms both in general terms, "...kid's potential not being …", and in terms of more operational goals, "...to raise test scores..." (p. 5).

Our second task was to attempt to find new, comprehensive ways to determine the content of particular courses. Most particularly, we were interested in … of difficulty. Our data collection involved five types of data: … on mathematics content. However, … which means that students' overall performance may increase because they have a more extended opportunity to learn a fixed amount of content. To obtain measures of what was actually covered in classes, our project surveyed teachers and students. Another approach involved gathering the assignments completed by the end of a course and selecting average as well as … examples for a post hoc portfolio. Conducting topic analyses … and asking teachers to show us "how far" they covered provided another measure.

Needed analyses are underway to combine … into measures that either have content validity or can be shown to have construct validity. Deployment of information of this sort, although … institutionalized, can greatly contribute to our understanding of all sorts of student performance.

Summary 

This report explored the definition of, and impetus for, the measurement of higher order thinking and discussed educational indicators as a way to provide context for results of new measures. A brief example of the development of new curriculum indicators was included to demonstrate the complexity and utility of such efforts.